{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# NOTE:  THIS NOTEBOOK RUNS FOR 30+ MINUTES.  \n",
    "\n",
    "# PLEASE BE PATIENT"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Train a Model with SageMaker Autopilot\n",
    "\n",
    "We will use Autopilot to predict the star rating of customer reviews. Autopilot implements a transparent approach to AutoML. \n",
    "\n",
    "For more details on Autopilot, have a look at this [**Amazon Science Publication**](https://www.amazon.science/publications/amazon-sagemaker-autopilot-a-white-box-automl-solution-at-scale).\n",
    "\n",
    "[![](img/amazon_science.png)](https://www.amazon.science/publications/amazon-sagemaker-autopilot-a-white-box-automl-solution-at-scale)\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<img src=\"img/autopilot-transparent.png\" width=\"80%\" align=\"left\">"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Introduction"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Amazon SageMaker Autopilot is a service to perform automated machine learning (AutoML) on your datasets.  Autopilot is available through the UI or AWS SDK.  In this notebook, we will use the AWS SDK to create and deploy a text processing and star rating classification machine learning pipeline."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Setup\n",
    "\n",
    "Let's start by specifying:\n",
    "\n",
    "* The S3 bucket and prefix to use to train our model.  _Note:  This should be in the same region as this notebook._\n",
    "* The IAM role of this notebook needs access to your data."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Notes\n",
    "* This notebook will take some time to finish. \n",
    "\n",
    "* You can start this notebook and continue to the next notebooks whenever you are waiting for the current notebook to finish."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Checking Pre-Requisites From Previous Notebook"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "%store -r autopilot_train_s3_uri"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "try:\n",
    "    autopilot_train_s3_uri\n",
    "    print(\"[OK]\")\n",
    "except NameError:\n",
    "    print(\"+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++\")\n",
    "    print(\"[ERROR] PLEASE RUN THE PREVIOUS 01_PREPARE_DATASET_AUTOPILOT NOTEBOOK.\")\n",
    "    print(\"+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "print(autopilot_train_s3_uri)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "if not autopilot_train_s3_uri:\n",
    "    print(\"+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++\")\n",
    "    print(\"[ERROR] PLEASE RUN THE PREVIOUS 01_PREPARE_DATASET_AUTOPILOT NOTEBOOK.\")\n",
    "    print(\"+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++\")\n",
    "else:\n",
    "    print(\"[OK]\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import boto3\n",
    "import sagemaker\n",
    "import pandas as pd\n",
    "import json\n",
    "\n",
    "sess = sagemaker.Session()\n",
    "bucket = sess.default_bucket()\n",
    "role = sagemaker.get_execution_role()\n",
    "region = boto3.Session().region_name\n",
    "\n",
    "sm = boto3.Session().client(service_name=\"sagemaker\", region_name=region)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Training Data"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "print(autopilot_train_s3_uri)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "!aws s3 ls $autopilot_train_s3_uri"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## See our prepared training data which we use as input for Autopilot"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "!aws s3 cp $autopilot_train_s3_uri ./tmp/"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import csv\n",
    "\n",
    "df = pd.read_csv(\"./tmp/amazon_reviews_us_Digital_Software_v1_00_autopilot.csv\")\n",
    "df.head()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Setup the S3 Location for the Autopilot-Generated Assets \n",
    "This include Jupyter Notebooks (Analysis), Python Scripts (Feature Engineering), and Trained Models."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "prefix_model_output = \"models/autopilot\"\n",
    "\n",
    "model_output_s3_uri = \"s3://{}/{}\".format(bucket, prefix_model_output)\n",
    "\n",
    "print(model_output_s3_uri)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "max_candidates = 3\n",
    "\n",
    "job_config = {\n",
    "    \"CompletionCriteria\": {\n",
    "        \"MaxRuntimePerTrainingJobInSeconds\": 900,\n",
    "        \"MaxCandidates\": max_candidates,\n",
    "        \"MaxAutoMLJobRuntimeInSeconds\": 5400,\n",
    "    },\n",
    "}\n",
    "\n",
    "input_data_config = [\n",
    "    {\n",
    "        \"DataSource\": {\"S3DataSource\": {\"S3DataType\": \"S3Prefix\", \"S3Uri\": \"{}\".format(autopilot_train_s3_uri)}},\n",
    "        \"TargetAttributeName\": \"star_rating\",\n",
    "    }\n",
    "]\n",
    "\n",
    "output_data_config = {\"S3OutputPath\": \"{}\".format(model_output_s3_uri)}"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Check For Existing Autopilot Jobs"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "existing_jobs_response = sm.list_auto_ml_jobs()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "num_existing_jobs = 0\n",
    "running_jobs = 0\n",
    "\n",
    "if \"AutoMLJobSummaries\" in existing_jobs_response.keys():\n",
    "    job_list = existing_jobs_response[\"AutoMLJobSummaries\"]\n",
    "    num_existing_jobs = len(job_list)\n",
    "    # print('[INFO] You already created {} Autopilot job(s) in this account.'.format(num_existing_jobs))\n",
    "    for j in job_list:\n",
    "        if \"AutoMLJobStatus\" in j.keys():\n",
    "            if j[\"AutoMLJobStatus\"] == \"InProgress\":\n",
    "                running_jobs = running_jobs + 1\n",
    "    print(\"[INFO] You have {} Autopilot job(s) currently running << Should be 0 jobs.\".format(running_jobs))\n",
    "else:\n",
    "    print(\"[OK] Please continue.\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Launch the SageMaker Autopilot Job"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from time import gmtime, strftime, sleep"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "%store -r auto_ml_job_name\n",
    "\n",
    "try:\n",
    "    auto_ml_job_name\n",
    "except NameError:\n",
    "    timestamp_suffix = strftime(\"%d-%H-%M-%S\", gmtime())\n",
    "    auto_ml_job_name = \"automl-dm-\" + timestamp_suffix\n",
    "    print(\"Created AutoMLJobName: \" + auto_ml_job_name)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "print(auto_ml_job_name)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "%store auto_ml_job_name"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "max_running_jobs = 1\n",
    "\n",
    "if running_jobs < max_running_jobs:  # Limiting to max. 1 Jobs\n",
    "    try:\n",
    "        sm.create_auto_ml_job(\n",
    "            AutoMLJobName=auto_ml_job_name,\n",
    "            InputDataConfig=input_data_config,\n",
    "            OutputDataConfig=output_data_config,\n",
    "            AutoMLJobConfig=job_config,\n",
    "            RoleArn=role,\n",
    "        )\n",
    "        print(\"[OK] Autopilot Job {} created.\".format(auto_ml_job_name))\n",
    "        running_jobs = running_jobs + 1\n",
    "    except:\n",
    "        print(\n",
    "            \"[INFO] You have already launched an Autopilot job. Please continue see the output of this job.\".format(\n",
    "                running_jobs\n",
    "            )\n",
    "        )\n",
    "else:\n",
    "    print(\n",
    "        \"[INFO] You have already launched {} Autopilot job(s). Please continue see the output of the running job.\".format(\n",
    "            running_jobs\n",
    "        )\n",
    "    )"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Track the Progress of the Autopilot Job\n",
    "\n",
    "SageMaker Autopilot job consists of the following high-level steps: \n",
    "* _Data Analysis_ where the data is summarized and analyzed to determine which feature engineering techniques, hyper-parameters, and models to explore.\n",
    "* _Feature Engineering_ where the data is scrubbed, balanced, combined, and split into train and validation.\n",
    "* _Model Training and Tuning_ where the top performing features, hyper-parameters, and models are selected and trained.\n",
    "\n",
    "<img src=\"img/autopilot-steps.png\" width=\"90%\" align=\"left\">"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**Autopilot Research Paper:   \n",
    "https://assets.amazon.science/e8/8b/2366b1ab407990dec96e55ee5664/amazon-sagemaker-autopilot-a-white-box-automl-solution-at-scale.pdf**"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Analyzing Data and Generate Notebooks"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "job_description_response = sm.describe_auto_ml_job(AutoMLJobName=auto_ml_job_name)\n",
    "\n",
    "while (\n",
    "    \"AutoMLJobStatus\" not in job_description_response.keys()\n",
    "    and \"AutoMLJobSecondaryStatus\" not in job_description_response.keys()\n",
    "):\n",
    "    job_description_response = sm.describe_auto_ml_job(AutoMLJobName=auto_ml_job_name)\n",
    "    print(\"[INFO] Autopilot Job has not yet started. Please wait. \")\n",
    "    print(json.dumps(job_description_response, indent=4, sort_keys=True, default=str))\n",
    "    print(\"[INFO] Waiting for Autopilot Job to start...\")\n",
    "    sleep(15)\n",
    "\n",
    "print(\"[OK] AutoMLJob started.\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Review the SageMaker `Processing Jobs`\n",
    "* First Processing Job (Data Splitter) checks the data sanity, performs stratified shuffling and splits the data into training and validation. \n",
    "* Second Processing Job (Candidate Generator) first streams through the data to compute statistics for the dataset. Then, uses these statistics to identify the problem type, and possible types of every column-predictor: numeric, categorical, natural language, etc."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from IPython.core.display import display, HTML\n",
    "\n",
    "display(\n",
    "    HTML(\n",
    "        '<b>Review <a target=\"blank\" href=\"https://console.aws.amazon.com/sagemaker/home?region={}#/processing-jobs/\">Processing Jobs</a></b>'.format(\n",
    "            region\n",
    "        )\n",
    "    )\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# The Next Cell Will Show `InProgress` For A Few Minutes.\n",
    "\n",
    "## _Please be patient._"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "%%time\n",
    "\n",
    "job_status = job_description_response[\"AutoMLJobStatus\"]\n",
    "job_sec_status = job_description_response[\"AutoMLJobSecondaryStatus\"]\n",
    "\n",
    "if job_status not in (\"Stopped\", \"Failed\"):\n",
    "    while job_status in (\"InProgress\") and job_sec_status in (\"Starting\", \"AnalyzingData\"):\n",
    "        job_description_response = sm.describe_auto_ml_job(AutoMLJobName=auto_ml_job_name)\n",
    "        job_status = job_description_response[\"AutoMLJobStatus\"]\n",
    "        job_sec_status = job_description_response[\"AutoMLJobSecondaryStatus\"]\n",
    "        print(job_status, job_sec_status)\n",
    "        sleep(15)\n",
    "    print(\"[OK] Data analysis phase completed.\\n\")\n",
    "\n",
    "print(json.dumps(job_description_response, indent=4, sort_keys=True, default=str))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# View Generated Notebook Samples\n",
    "Once data analysis is complete, SageMaker AutoPilot generates two notebooks: \n",
    "* Data Exploration\n",
    "* Candidate Definition"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Waiting For Generated Notebooks"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "job_description_response = sm.describe_auto_ml_job(AutoMLJobName=auto_ml_job_name)\n",
    "\n",
    "while \"AutoMLJobArtifacts\" not in job_description_response.keys():\n",
    "    job_description_response = sm.describe_auto_ml_job(AutoMLJobName=auto_ml_job_name)\n",
    "    print(\"[INFO] Autopilot Job has not yet generated the artifacts. Please wait. \")\n",
    "    print(json.dumps(job_description_response, indent=4, sort_keys=True, default=str))\n",
    "    print(\"[INFO] Waiting for AutoMLJobArtifacts...\")\n",
    "    sleep(15)\n",
    "\n",
    "print(\"[OK] AutoMLJobArtifacts generated.\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "job_description_response = sm.describe_auto_ml_job(AutoMLJobName=auto_ml_job_name)\n",
    "\n",
    "while \"DataExplorationNotebookLocation\" not in job_description_response[\"AutoMLJobArtifacts\"].keys():\n",
    "    job_description_response = sm.describe_auto_ml_job(AutoMLJobName=auto_ml_job_name)\n",
    "    print(\"[INFO] Autopilot Job has not yet generated the notebooks. Please wait. \")\n",
    "    print(json.dumps(job_description_response, indent=4, sort_keys=True, default=str))\n",
    "    print(\"[INFO] Waiting for DataExplorationNotebookLocation...\")\n",
    "    sleep(15)\n",
    "\n",
    "print(\"[OK] DataExplorationNotebookLocation found.\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "generated_resources = job_description_response[\"AutoMLJobArtifacts\"][\"DataExplorationNotebookLocation\"]\n",
    "download_path = generated_resources.rsplit(\"/notebooks/SageMakerAutopilotDataExplorationNotebook.ipynb\")[0]\n",
    "job_id = download_path.rsplit(\"/\", 1)[-1]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from IPython.core.display import display, HTML\n",
    "\n",
    "if not job_id:\n",
    "    print(\"No AutoMLJobArtifacts found.\")\n",
    "else:\n",
    "    display(\n",
    "        HTML(\n",
    "            '<b>Review <a target=\"blank\" href=\"https://s3.console.aws.amazon.com/s3/buckets/{}/{}/{}/sagemaker-automl-candidates/{}/\">S3 Generated Resources</a></b>'.format(\n",
    "                bucket, prefix_model_output, auto_ml_job_name, job_id\n",
    "            )\n",
    "        )\n",
    "    )"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Download Generated Notebooks & Code"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "print(download_path)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "try:\n",
    "    !aws s3 cp --recursive $download_path .\n",
    "except:\n",
    "    print('Could not download the generated resources. Make sure the path is correct.')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Review The Generated Resources:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "!ls ./generated_module/candidate_data_processors"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "!ls ./notebooks"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Feature Engineering"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Watch out for SageMaker `Training Jobs` and `Batch Transform Jobs` to start. \n",
    "\n",
    "* This is the candidate exploration phase. \n",
    "* Each python script code for data-processing is executed inside a SageMaker framework container as a training job, followed by transform job.\n",
    "\n",
    "Note, that feature preprocessing part of each pipeline has all hyper parameters fixed, i.e. does not require tuning, thus feature preprocessing step can be done prior runing the hyper parameter optimization job. \n",
    "\n",
    "It outputs up to 10 variants of transformed data, therefore algorithms for each pipeline are set to use\n",
    "the respective transformed data.\n",
    "\n",
    "<img src=\"img/autopilot-steps.png\" width=\"90%\" align=\"left\">"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**Autopilot Research Paper:   \n",
    "https://assets.amazon.science/e8/8b/2366b1ab407990dec96e55ee5664/amazon-sagemaker-autopilot-a-white-box-automl-solution-at-scale.pdf**"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from IPython.core.display import display, HTML\n",
    "\n",
    "display(\n",
    "    HTML(\n",
    "        '<b>Review <a target=\"blank\" href=\"https://console.aws.amazon.com/sagemaker/home?region={}#/jobs/\">Training Jobs</a></b>'.format(\n",
    "            region\n",
    "        )\n",
    "    )\n",
    ")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from IPython.core.display import display, HTML\n",
    "\n",
    "display(\n",
    "    HTML(\n",
    "        '<b>Review <a target=\"blank\" href=\"https://console.aws.amazon.com/sagemaker/home?region={}#/transform-jobs/\">Batch Transform Jobs</a></b>'.format(\n",
    "            region\n",
    "        )\n",
    "    )\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# The Next Cell Will Show `InProgress` For A Few Minutes.\n",
    "\n",
    "## _Please be patient._ ##"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "%%time\n",
    "\n",
    "job_description_response = sm.describe_auto_ml_job(AutoMLJobName=auto_ml_job_name)\n",
    "job_status = job_description_response[\"AutoMLJobStatus\"]\n",
    "job_sec_status = job_description_response[\"AutoMLJobSecondaryStatus\"]\n",
    "print(job_status)\n",
    "print(job_sec_status)\n",
    "if job_status not in (\"Stopped\", \"Failed\"):\n",
    "    while job_status in (\"InProgress\") and job_sec_status in (\"FeatureEngineering\"):\n",
    "        job_description_response = sm.describe_auto_ml_job(AutoMLJobName=auto_ml_job_name)\n",
    "        job_status = job_description_response[\"AutoMLJobStatus\"]\n",
    "        job_sec_status = job_description_response[\"AutoMLJobSecondaryStatus\"]\n",
    "        print(job_status, job_sec_status)\n",
    "        sleep(15)\n",
    "    print(\"[OK] Feature engineering phase completed.\\n\")\n",
    "\n",
    "print(json.dumps(job_description_response, indent=4, sort_keys=True, default=str))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# [INFO] _Feel free to continue to the next workshop section while this notebook is running._"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Model Training and Tuning"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Watch out for a SageMaker`Hyperparameter Tuning Job` and various `Training Jobs` to start. \n",
    "\n",
    "* All algorithms are optimized using a SageMaker Hyperparameter Tuning job. \n",
    "* Up to 250 training jobs (based on number of candidates specified) are selectively executed to find the best candidate model.\n",
    "\n",
    "<img src=\"img/autopilot-steps.png\" width=\"90%\" align=\"left\">"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**Autopilot Research Paper:   \n",
    "https://assets.amazon.science/e8/8b/2366b1ab407990dec96e55ee5664/amazon-sagemaker-autopilot-a-white-box-automl-solution-at-scale.pdf**"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from IPython.core.display import display, HTML\n",
    "\n",
    "display(\n",
    "    HTML(\n",
    "        '<b>Review <a target=\"blank\" href=\"https://console.aws.amazon.com/sagemaker/home?region={}#/hyper-tuning-jobs/\">Hyperparameter Tuning Jobs</a></b>'.format(\n",
    "            region\n",
    "        )\n",
    "    )\n",
    ")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from IPython.core.display import display, HTML\n",
    "\n",
    "display(\n",
    "    HTML(\n",
    "        '<b>Review <a target=\"blank\" href=\"https://console.aws.amazon.com/sagemaker/home?region={}#/jobs/\">Training Jobs</a></b>'.format(\n",
    "            region\n",
    "        )\n",
    "    )\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# The Next Cell Will Show `InProgress` For A Few Minutes.\n",
    "\n",
    "## _Please be patient._"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "%%time\n",
    "\n",
    "job_description_response = sm.describe_auto_ml_job(AutoMLJobName=auto_ml_job_name)\n",
    "job_status = job_description_response[\"AutoMLJobStatus\"]\n",
    "job_sec_status = job_description_response[\"AutoMLJobSecondaryStatus\"]\n",
    "print(job_status)\n",
    "print(job_sec_status)\n",
    "if job_status not in (\"Stopped\", \"Failed\"):\n",
    "    while job_status in (\"InProgress\") and job_sec_status in (\"ModelTuning\"):\n",
    "        job_description_response = sm.describe_auto_ml_job(AutoMLJobName=auto_ml_job_name)\n",
    "        job_status = job_description_response[\"AutoMLJobStatus\"]\n",
    "        job_sec_status = job_description_response[\"AutoMLJobSecondaryStatus\"]\n",
    "        print(job_status, job_sec_status)\n",
    "        sleep(15)\n",
    "    print(\"[OK] Model tuning phase completed.\\n\")\n",
    "\n",
    "print(json.dumps(job_description_response, indent=4, sort_keys=True, default=str))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# _Please Wait Until ^^ Autopilot ^^ Completes Above_\n",
    "\n",
    "# [INFO] _Feel free to continue to the next workshop section while this notebook is running._"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Make sure the status below indicates `Completed`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "%%time\n",
    "\n",
    "job_description_response = sm.describe_auto_ml_job(AutoMLJobName=auto_ml_job_name)\n",
    "job_status = job_description_response[\"AutoMLJobStatus\"]\n",
    "print(job_status)\n",
    "if job_status not in (\"Stopped\", \"Failed\"):\n",
    "    while job_status not in (\"Completed\"):\n",
    "        job_description_response = sm.describe_auto_ml_job(AutoMLJobName=auto_ml_job_name)\n",
    "        job_status = job_description_response[\"AutoMLJobStatus\"]\n",
    "        print(job_status)\n",
    "        sleep(10)\n",
    "    print(\"[OK] Autopilot Job completed.\\n\")\n",
    "else:\n",
    "    print(job_status)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Viewing All Candidates\n",
    "Once model tuning is complete, you can view all the candidates (pipeline evaluations with different hyperparameter combinations) that were explored by AutoML and sort them by their final performance metric."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "candidates_response = sm.list_candidates_for_auto_ml_job(\n",
    "    AutoMLJobName=auto_ml_job_name, SortBy=\"FinalObjectiveMetricValue\"\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Check that candidates exist"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "print(candidates_response.keys())"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "while \"Candidates\" not in candidates_response.keys():\n",
    "    candidates_response = sm.list_candidates_for_auto_ml_job(\n",
    "        AutoMLJobName=auto_ml_job_name, SortBy=\"FinalObjectiveMetricValue\"\n",
    "    )\n",
    "    print(\"[INFO] Autopilot Job is generating the Candidates. Please wait.\")\n",
    "    print(json.dumps(candidates_response, indent=4, sort_keys=True, default=str))\n",
    "    sleep(10)\n",
    "\n",
    "candidates = candidates_response[\"Candidates\"]\n",
    "print(\"[OK] Candidates generated.\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "print(candidates[0].keys())"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "while \"CandidateName\" not in candidates[0]:\n",
    "    candidates_response = sm.list_candidates_for_auto_ml_job(\n",
    "        AutoMLJobName=auto_ml_job_name, SortBy=\"FinalObjectiveMetricValue\"\n",
    "    )\n",
    "    candidates = candidates_response[\"Candidates\"]\n",
    "    print(\"[INFO] Autopilot Job is generating CandidateName. Please wait. \")\n",
    "    print(json.dumps(candidates, indent=4, sort_keys=True, default=str))\n",
    "    sleep(10)\n",
    "\n",
    "print(\"[OK] CandidateName generated.\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "while \"FinalAutoMLJobObjectiveMetric\" not in candidates[0]:\n",
    "    candidates_response = sm.list_candidates_for_auto_ml_job(\n",
    "        AutoMLJobName=auto_ml_job_name, SortBy=\"FinalObjectiveMetricValue\"\n",
    "    )\n",
    "    candidates = candidates_response[\"Candidates\"]\n",
    "    print(\"[INFO] Autopilot Job is generating FinalAutoMLJobObjectiveMetric. Please wait. \")\n",
    "    print(json.dumps(candidates, indent=4, sort_keys=True, default=str))\n",
    "    sleep(10)\n",
    "\n",
    "print(\"[OK] FinalAutoMLJobObjectiveMetric generated.\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "print(json.dumps(candidates, indent=4, sort_keys=True, default=str))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "for index, candidate in enumerate(candidates):\n",
    "    print(\n",
    "        str(index)\n",
    "        + \"  \"\n",
    "        + candidate[\"CandidateName\"]\n",
    "        + \"  \"\n",
    "        + str(candidate[\"FinalAutoMLJobObjectiveMetric\"][\"Value\"])\n",
    "    )"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Inspect Trials using Experiments API\n",
    "\n",
    "SageMaker Autopilot automatically creates a new experiment, and pushes information for each trial. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from sagemaker.analytics import ExperimentAnalytics, TrainingJobAnalytics\n",
    "\n",
    "exp = ExperimentAnalytics(\n",
    "    sagemaker_session=sess,\n",
    "    experiment_name=auto_ml_job_name + \"-aws-auto-ml-job\",\n",
    ")\n",
    "\n",
    "df = exp.dataframe()\n",
    "print(df)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Explore the Best Candidate\n",
    "Now that we have successfully completed the AutoML job on our dataset and visualized the trials, we can create a model from any of the trials with a single API call and then deploy that model for online or batch prediction using [Inference Pipelines](https://docs.aws.amazon.com/sagemaker/latest/dg/inference-pipelines.html). For this notebook, we deploy only the best performing trial for inference."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The best candidate is the one we're really interested in."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "best_candidate_response = sm.describe_auto_ml_job(AutoMLJobName=auto_ml_job_name)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "print(best_candidate_response.keys())"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "while \"BestCandidate\" not in best_candidate_response:\n",
    "    best_candidate_response = sm.describe_auto_ml_job(AutoMLJobName=auto_ml_job_name)\n",
    "    print(\"[INFO] Autopilot Job is generating BestCandidate. Please wait. \")\n",
    "    print(json.dumps(best_candidate_response, indent=4, sort_keys=True, default=str))\n",
    "    sleep(10)\n",
    "\n",
    "best_candidate = best_candidate_response[\"BestCandidate\"]\n",
    "print(\"[OK] BestCandidate generated.\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "print(json.dumps(best_candidate_response, indent=4, sort_keys=True, default=str))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "print(best_candidate.keys())"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Doing BestCandidate check again because of Github issue https://github.com/data-science-on-aws/data-science-on-aws/issues/107\n",
    "while \"CandidateName\" not in best_candidate and \"BestCandidate\" not in best_candidate_response:\n",
    "    best_candidate_response = sm.describe_auto_ml_job(AutoMLJobName=auto_ml_job_name)\n",
    "    best_candidate = best_candidate_response[\"BestCandidate\"]\n",
    "    print(\"[INFO] Autopilot Job is generating BestCandidate CandidateName. Please wait. \")\n",
    "    print(json.dumps(best_candidate, indent=4, sort_keys=True, default=str))\n",
    "    sleep(10)\n",
    "\n",
    "print(\"[OK] BestCandidate CandidateName generated.\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Doing BestCandidate check again because of Github issue https://github.com/data-science-on-aws/data-science-on-aws/issues/107\n",
    "while \"FinalAutoMLJobObjectiveMetric\" not in best_candidate and \"BestCandidate\" not in best_candidate_response:\n",
    "    best_candidate_response = sm.describe_auto_ml_job(AutoMLJobName=auto_ml_job_name)\n",
    "    best_candidate = best_candidate_response[\"BestCandidate\"]\n",
    "    print(\"[INFO] Autopilot Job is generating BestCandidate FinalAutoMLJobObjectiveMetric. Please wait. \")\n",
    "    print(json.dumps(best_candidate, indent=4, sort_keys=True, default=str))\n",
    "    sleep(10)\n",
    "\n",
    "print(\"[OK] BestCandidate FinalAutoMLJobObjectiveMetric generated.\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "best_candidate_identifier = best_candidate[\"CandidateName\"]\n",
    "print(\"Candidate name: \" + best_candidate_identifier)\n",
    "print(\"Metric name: \" + best_candidate[\"FinalAutoMLJobObjectiveMetric\"][\"MetricName\"])\n",
    "print(\"Metric value: \" + str(best_candidate[\"FinalAutoMLJobObjectiveMetric\"][\"Value\"]))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "print(json.dumps(best_candidate, indent=4, sort_keys=True, default=str))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# View Individual Autopilot Jobs"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Doing BestCandidate check again because of Github issue https://github.com/data-science-on-aws/data-science-on-aws/issues/107\n",
    "while \"CandidateSteps\" not in best_candidate and \"BestCandidate\" not in best_candidate_response:\n",
    "    best_candidate_response = sm.describe_auto_ml_job(AutoMLJobName=auto_ml_job_name)\n",
    "    best_candidate = best_candidate_response[\"BestCandidate\"]\n",
    "    print(\"[INFO] Autopilot Job is generating BestCandidate CandidateSteps. Please wait. \")\n",
    "    print(json.dumps(best_candidate, indent=4, sort_keys=True, default=str))\n",
    "    sleep(10)\n",
    "\n",
    "best_candidate = best_candidate_response[\"BestCandidate\"]\n",
    "print(\"[OK] BestCandidate CandidateSteps generated.\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "# Doing BestCandidate check again because of Github issue https://github.com/data-science-on-aws/data-science-on-aws/issues/107\n",
    "while \"CandidateStepType\" not in best_candidate[\"CandidateSteps\"][0] and \"BestCandidate\" not in best_candidate_response:\n",
    "    best_candidate_response = sm.describe_auto_ml_job(AutoMLJobName=auto_ml_job_name)\n",
    "    best_candidate = best_candidate_response[\"BestCandidate\"]\n",
    "    print(\"[INFO] Autopilot Job is generating BestCandidate CandidateSteps CandidateStepType. Please wait. \")\n",
    "    print(json.dumps(best_candidate, indent=4, sort_keys=True, default=str))\n",
    "    sleep(10)\n",
    "\n",
    "best_candidate = best_candidate_response[\"BestCandidate\"]\n",
    "print(\"[OK] BestCandidate CandidateSteps CandidateStepType generated.\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Doing BestCandidate check again because of Github issue https://github.com/data-science-on-aws/data-science-on-aws/issues/107\n",
    "while \"CandidateStepName\" not in best_candidate[\"CandidateSteps\"][0] and \"BestCandidate\" not in best_candidate_response:\n",
    "    best_candidate_response = sm.describe_auto_ml_job(AutoMLJobName=auto_ml_job_name)\n",
    "    best_candidate = best_candidate_response[\"BestCandidate\"]\n",
    "    print(\"[INFO] Autopilot Job is generating BestCandidate CandidateSteps CandidateStepName. Please wait. \")\n",
    "    print(json.dumps(best_candidate, indent=4, sort_keys=True, default=str))\n",
    "    sleep(10)\n",
    "\n",
    "best_candidate = best_candidate_response[\"BestCandidate\"]\n",
    "print(\"[OK] BestCandidate CandidateSteps CandidateStepName generated.\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "best_candidate"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "steps = []\n",
    "for step in best_candidate[\"CandidateSteps\"]:\n",
    "    print(\"Candidate Step Type: {}\".format(step[\"CandidateStepType\"]))\n",
    "    print(\"Candidate Step Name: {}\".format(step[\"CandidateStepName\"]))\n",
    "    steps.append(step[\"CandidateStepName\"])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from IPython.core.display import display, HTML\n",
    "\n",
    "display(\n",
    "    HTML(\n",
    "        '<b>Review Best Candidate <a target=\"blank\" href=\"https://console.aws.amazon.com/sagemaker/home?region={}#/processing-jobs/{}\">Processing Job</a></b>'.format(\n",
    "            region, steps[0]\n",
    "        )\n",
    "    )\n",
    ")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from IPython.core.display import display, HTML\n",
    "\n",
    "display(\n",
    "    HTML(\n",
    "        '<b>Review Best Candidate <a target=\"blank\" href=\"https://console.aws.amazon.com/sagemaker/home?region={}#/jobs/{}\">Training Job</a></b>'.format(\n",
    "            region, steps[1]\n",
    "        )\n",
    "    )\n",
    ")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from IPython.core.display import display, HTML\n",
    "\n",
    "display(\n",
    "    HTML(\n",
    "        '<b>Review Best Candidate <a target=\"blank\" href=\"https://console.aws.amazon.com/sagemaker/home?region={}#/transform-jobs/{}\">Transform Job</a></b>'.format(\n",
    "            region, steps[2]\n",
    "        )\n",
    "    )\n",
    ")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from IPython.core.display import display, HTML\n",
    "\n",
    "display(\n",
    "    HTML(\n",
    "        '<b>Review Best Candidate <a target=\"blank\" href=\"https://console.aws.amazon.com/sagemaker/home?region={}#/jobs/{}\">Training Job (Tuning)</a></b>'.format(\n",
    "            region, steps[3]\n",
    "        )\n",
    "    )\n",
    ")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from IPython.core.display import display, HTML\n",
    "\n",
    "display(\n",
    "    HTML(\n",
    "        '<b>Review All <a target=\"blank\" href=\"https://console.aws.amazon.com/sagemaker/home?region={}#/processing-jobs/\">Processing Jobs</a></b>'.format(\n",
    "            region\n",
    "        )\n",
    "    )\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Review All Output in S3\n",
    "\n",
    "You will see the artifacts generated by Autopilot including the following:\n",
    "```\n",
    "data-processor-models/        # \"models\" learned to transform raw data into features \n",
    "documentation/                # explainability and other documentation about your model\n",
    "preprocessed-data/            # data for train and validation\n",
    "sagemaker-automl-candidates/  # candidate models which autopilot compares\n",
    "transformed-data/             # candidate-specific data for train and validation\n",
    "tuning/                       # candidate-specific tuning results\n",
    "validations/                  # validation results\n",
    "```"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from IPython.core.display import display, HTML\n",
    "\n",
    "display(\n",
    "    HTML(\n",
    "        '<b>Review All <a target=\"blank\" href=\"https://s3.console.aws.amazon.com/s3/buckets/{}?region={}&prefix=models/autopilot/{}/\">Output in S3</a></b>'.format(\n",
    "            bucket, region, auto_ml_job_name\n",
    "        )\n",
    "    )\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# See the Containers and Models within the Inference Pipeline"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Doing BestCandidate check again because of Github issue https://github.com/data-science-on-aws/data-science-on-aws/issues/107\n",
    "while \"InferenceContainers\" not in best_candidate and \"BestCandidate\" not in best_candidate_response:\n",
    "    best_candidate_response = sm.describe_auto_ml_job(AutoMLJobName=auto_ml_job_name)\n",
    "    best_candidate = best_candidate_response[\"BestCandidate\"]\n",
    "    print(\"[INFO] Autopilot Job is generating BestCandidate InferenceContainers. Please wait. \")\n",
    "    print(json.dumps(best_candidate, indent=4, sort_keys=True, default=str))\n",
    "    sleep(10)\n",
    "\n",
    "print(\"[OK] BestCandidate InferenceContainers generated.\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "best_candidate_containers = best_candidate[\"InferenceContainers\"]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "for container in best_candidate_containers:\n",
    "    print(container[\"Image\"])\n",
    "    print(container[\"ModelDataUrl\"])\n",
    "    print(\"======================\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Update Containers To Show Predicted Label and Confidence Score"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "for container in best_candidate_containers:\n",
    "    print(container[\"Environment\"])\n",
    "    print(\"======================\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "best_candidate_containers[1][\"Environment\"].update({\"SAGEMAKER_INFERENCE_OUTPUT\": \"predicted_label, probability\"})\n",
    "best_candidate_containers[2][\"Environment\"].update({\"SAGEMAKER_INFERENCE_INPUT\": \"predicted_label, probability\"})\n",
    "best_candidate_containers[2][\"Environment\"].update({\"SAGEMAKER_INFERENCE_OUTPUT\": \"predicted_label, probability\"})"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "for container in best_candidate_containers:\n",
    "    print(container[\"Environment\"])\n",
    "    print(\"======================\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Autopilot Chooses XGBoost as Best Candidate!\n",
    "\n",
    "Note that Autopilot chose different hyper-parameters and feature transformations than we used in our own XGBoost model."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Deploy the Model as a REST Endpoint\n",
    "Batch transformations are also supported, but for now, we will use a REST Endpoint."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "print(best_candidate[\"InferenceContainers\"])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "%store -r autopilot_model_name"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "try:\n",
    "    autopilot_model_name\n",
    "except NameError:\n",
    "    timestamp_suffix = strftime(\"%d-%H-%M-%S\", gmtime())\n",
    "    autopilot_model_name = \"automl-dm-model-\" + timestamp_suffix\n",
    "    print(\"[OK] Created Autopilot Model Name: \" + autopilot_model_name)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "%store autopilot_model_name"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "%store -r autopilot_model_arn"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "try:\n",
    "    autopilot_model_arn\n",
    "except NameError:\n",
    "    create_model_response = sm.create_model(\n",
    "        Containers=best_candidate[\"InferenceContainers\"], ModelName=autopilot_model_name, ExecutionRoleArn=role\n",
    "    )\n",
    "    autopilot_model_arn = create_model_response[\"ModelArn\"]\n",
    "    print(\"[OK] Created Autopilot Model: {}\".format(autopilot_model_arn))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "%store autopilot_model_arn"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Define EndpointConfig Name"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "timestamp_suffix = strftime(\"%d-%H-%M-%S\", gmtime())\n",
    "epc_name = \"automl-dm-epc-\" + timestamp_suffix\n",
    "\n",
    "print(epc_name)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Define REST Endpoint Name for Autopilot Model"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "%store -r autopilot_endpoint_name"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "timestamp_suffix = strftime(\"%d-%H-%M-%S\", gmtime())\n",
    "\n",
    "try:\n",
    "    autopilot_endpoint_name\n",
    "except NameError:\n",
    "    autopilot_endpoint_name = \"automl-dm-ep-\" + timestamp_suffix\n",
    "    print(\"[OK] Created Autopilot Endpoint Name {}: \".format(autopilot_endpoint_name))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "variant_name = \"automl-dm-variant-\" + timestamp_suffix\n",
    "print(\"[OK] Created Endpoint Variant Name {}: \".format(variant_name))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "%store autopilot_endpoint_name"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "ep_config = sm.create_endpoint_config(\n",
    "    EndpointConfigName=epc_name,\n",
    "    ProductionVariants=[\n",
    "        {\n",
    "            \"InstanceType\": \"ml.m5.large\",\n",
    "            \"InitialInstanceCount\": 1,\n",
    "            \"ModelName\": autopilot_model_name,\n",
    "            \"VariantName\": variant_name,\n",
    "        }\n",
    "    ],\n",
    ")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "%store -r autopilot_endpoint_arn"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "try:\n",
    "    autopilot_endpoint_arn\n",
    "except NameError:\n",
    "    create_endpoint_response = sm.create_endpoint(EndpointName=autopilot_endpoint_name, EndpointConfigName=epc_name)\n",
    "    autopilot_endpoint_arn = create_endpoint_response[\"EndpointArn\"]\n",
    "    print(autopilot_endpoint_arn)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "%store autopilot_endpoint_arn"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from IPython.core.display import display, HTML\n",
    "\n",
    "display(\n",
    "    HTML(\n",
    "        '<b>Review <a target=\"blank\" href=\"https://console.aws.amazon.com/sagemaker/home?region={}#/endpoints/{}\">SageMaker REST Endpoint</a></b>'.format(\n",
    "            region, autopilot_endpoint_name\n",
    "        )\n",
    "    )\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Store Variables for the Next Notebooks"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "%store"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Summary\n",
    "We used Autopilot to automatically find the best model, hyper-parameters, and feature-engineering scripts for our dataset.  \n",
    "\n",
    "Autopilot uses a transparent approach to generate re-usable exploration Jupyter Notebooks and transformation Python scripts to continue to train and deploy our model on new data - well after this initial interaction with the Autopilot service."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Release Resources"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "%%html\n",
    "\n",
    "<p><b>Shutting down your kernel for this notebook to release resources.</b></p>\n",
    "<button class=\"sm-command-button\" data-commandlinker-command=\"kernelmenu:shutdown\" style=\"display:none;\">Shutdown Kernel</button>\n",
    "        \n",
    "<script>\n",
    "try {\n",
    "    els = document.getElementsByClassName(\"sm-command-button\");\n",
    "    els[0].click();\n",
    "}\n",
    "catch(err) {\n",
    "    // NoOp\n",
    "}    \n",
    "</script>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "%%javascript\n",
    "\n",
    "try {\n",
    "    Jupyter.notebook.save_checkpoint();\n",
    "    Jupyter.notebook.session.delete();\n",
    "}\n",
    "catch(err) {\n",
    "    // NoOp\n",
    "}"
   ]
  }
 ],
 "metadata": {
  "instance_type": "ml.t3.medium",
  "kernelspec": {
   "display_name": "Python 3 (Data Science)",
   "language": "python",
   "name": "python3__SAGEMAKER_INTERNAL__arn:aws:sagemaker:us-east-1:081325390199:image/datascience-1.0"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.7.10"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}
